DATA MODELING USING QUANTILE AND
DENSITY-QUANTILE FUNCTIONS


by  Emanuel  Parzen 

Texas  A&M  University 
Institute  of  Statistics 

Abstract 


Statistical data modeling is a field of statistical reasoning that seeks
to fit models to data without using models based on prior theory; rather,
one seeks to learn the model by a process which could be called
statistical model identification. When analyzing a sample
$X_1, \ldots, X_n$, statisticians should not confine themselves to either
fitting a Gaussian distribution or transforming the data to be Gaussian.
Such an approach ignores the importance of bimodality as a feature of
observed data, and also ignores the need to fit to data distributions
based on probability models, which could suggest probability models for
the causes generating the data.

This  paper  describes  an  approach  to  statistical  data  modeling 
which  emphasizes  estimation  of  quantile  and  density-quantile 
functions;  it  treats  the  Gaussian  distribution  as  just  one  of 
the  available  distributions. 

Sections 1-3 introduce the role of quantile functions in statistical
modeling, the sample quantile function, and location and scale parameter
models. Quantile-function-based descriptors of a probability distribution
are defined in Section 4. Section 5 defines quantile box plots and
transformation distribution functions; an example of their application is
discussed in Section 6. A quantile version of "bootstrap" simulation
methods is outlined in Section 7. Section 8 discusses quantile function
formulations of robust estimators of location and scale. Data summary by a
few values of the sample quantile function is discussed in Section 9.

The  concepts  discussed  in  this  paper  are  best  summarized 
by  a  list  of  some  of  the  terminology  defined:  quantile  function, 
density-quantile  function,  score  function,  sample  quantile 
function,  sample  quantile-density  function,  histogram-quantile 
function,  sample  entropy,  score  deviation,  tail  exponents,  mode 
percentile,  quantile  box  plot,  cumulative  weighted  spacings 
plot,  quantile  bootstrap,  minimum  residual  score  deviation 
estimation,  and  19  quantile  values  for  universal  data  summary. 


1. Some Basic Concepts of Statistical Modeling and Estimation

One of the basic problems of statistical data analysis is the one-sample
problem: given a sample $X_1, \ldots, X_n$, which we assume initially to
be independent observations of a population characteristic represented by
a random variable X, we would like to infer the probability distribution
of X.

The  probability  distribution  of  X  is  usually  represented 
by  its  distribution  function 

$$F(x) = \Pr[X \le x]$$

and by its probability density function

$$f(x) = F'(x).$$

In this paper we assume X is continuous and possesses a probability
density function.

The problem of statistical inference is often defined to be parameter
estimation; one then assumes that the true probability density function
f(x) belongs to a family of functions $f_\theta(x)$ indexed by a vector
$\theta$ of parameters $\theta_1, \ldots, \theta_r$.

The maximum likelihood estimator of $\theta$ is defined to be a function
$\hat\theta$ of $X_1, \ldots, X_n$ satisfying
$L(\hat\theta) = \max_\theta L(\theta)$, defining

$$L(\theta) = f_\theta(X_1, \ldots, X_n) = \prod_{j=1}^{n} f_\theta(X_j);$$

$L(\theta)$ is the joint probability density of the observed data when
$\theta$ is the true parameter value.

Maximum likelihood estimation is not a principle to be accepted
uncritically; statisticians delight in constructing examples in which it
leads to unbelievable conclusions. To understand when and why maximum
likelihood estimation works, we have to introduce the empirical
distribution function (EDF) $\tilde F(x)$, defined by

$$\tilde F(x) = \text{fraction of } X_1, \ldots, X_n \le x.$$

To graph $\tilde F(x)$, one determines the order statistics
$X_{(1)} < X_{(2)} < \cdots < X_{(n)}$, which are the sample values
(assumed to be distinct) arranged in increasing order; then

$$\tilde F(x) = \frac{j}{n}, \qquad X_{(j)} \le x < X_{(j+1)}, \quad j = 0, 1, 2, \ldots, n,$$

where $X_{(0)} = -\infty$ and $X_{(n+1)} = \infty$.
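
The step-function definition of the EDF can be sketched in a few lines of Python. This is only an illustration; the function name `edf` is hypothetical and not part of the paper.

```python
import bisect

def edf(sample):
    """Return the empirical distribution function of `sample` as a callable."""
    xs = sorted(sample)          # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sample values are <= x
        return bisect.bisect_right(xs, x) / n

    return F

F = edf([3.0, 1.0, 2.0, 4.0])
print(F(0.5), F(1.0), F(2.5), F(4.0))
```

The returned function jumps by 1/n at each order statistic, as in the display above.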

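The concept of likelihood is now defined as the average log likelihood

$$L_n(\theta) = \frac{1}{n} \log \prod_{j=1}^{n} f_\theta(X_j)
 = \frac{1}{n} \sum_{j=1}^{n} \log f_\theta(X_j)
 = \int_{-\infty}^{\infty} \log f_\theta(x) \, d\tilde F(x).$$

One can regard $L_n(\theta)$ as a measure of "distance" between the data,
represented by $\tilde F$, and the model, represented by $f_\theta(x)$.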
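
As a small illustration of the average log likelihood, the sketch below evaluates it for a Gaussian family and checks that the closed-form maximum likelihood values (sample mean and the divisor-n standard deviation) score at least as well as nearby parameter values. The Gaussian family and the function names are assumptions made for the example.

```python
import math

def avg_log_likelihood(sample, mu, sigma):
    """Average log likelihood L_n(theta) for the Gaussian family, theta = (mu, sigma)."""
    n = len(sample)
    total = 0.0
    for x in sample:
        z = (x - mu) / sigma
        total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return total / n

def gaussian_mle(sample):
    """Closed-form maximizer of L_n for the Gaussian family."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)
    return mu, sigma

data = [1.0, 2.0, 3.0, 4.0, 5.0]
mu_hat, sigma_hat = gaussian_mle(data)
print(mu_hat, sigma_hat, avg_log_likelihood(data, mu_hat, sigma_hat))
```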

Another important interpretation of $L_n(\theta)$ is as an estimator of a
"distance" between the true probability density f(x) and the model
$f_\theta(x)$. An important role in the theory of statistical inference is
played by the Kullback-Leibler information number, or directed divergence
(see Zacks (1971)); it is defined by

$$I(f; f_\theta) = E_f\left[\log \frac{f}{f_\theta}\right]
 = \int_{-\infty}^{\infty} f(x) \log \frac{f(x)}{f_\theta(x)} \, dx
 = H(f; f) - H(f; f_\theta),$$

defining

$$H(f; g) = \int_{-\infty}^{\infty} f(x) \log g(x) \, dx.$$

It has the properties $I(f; f_\theta) \ge 0$ and $I(f; f) = 0$.
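
The two stated properties of the directed divergence can be checked numerically. The sketch below approximates the integral by the trapezoid rule for two Gaussian densities; the integration range, step count, and function names are assumptions chosen for the example, and the comparison value is the standard closed form for two normal densities.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def directed_divergence(f, g, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoid-rule approximation of I(f; g) = integral of f log(f/g)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        fx = f(x)
        term = fx * math.log(fx / g(x)) if fx > 0.0 else 0.0
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * term
    return total * h

f = lambda x: normal_pdf(x, 0.0, 1.0)   # "true" density
g = lambda x: normal_pdf(x, 1.0, 2.0)   # model density
# Closed form for N(0,1) against N(mu, sigma^2): log(sigma) + (1 + mu^2)/(2 sigma^2) - 1/2
closed_form = math.log(2.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5
print(directed_divergence(f, g), closed_form, directed_divergence(f, f))
```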

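The average directed divergence between f and $f_\theta$ given a sample
$X_1, \ldots, X_n$ is

$$\frac{1}{n} I_n(f; f_\theta)
 = \frac{1}{n} E_f\left[\log \frac{f(X_1, \ldots, X_n)}{f_\theta(X_1, \ldots, X_n)}\right]
 = \frac{1}{n} \int \cdots \int f(x_1, \ldots, x_n)
   \log \frac{f(x_1, \ldots, x_n)}{f_\theta(x_1, \ldots, x_n)} \, dx_1 \cdots dx_n
 = I(f; f_\theta).$$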

A criterion for model fitting is to choose $f_\theta$ to minimize
$I(f; f_\theta)$ or an estimator of $I(f; f_\theta)$; an estimator would be

$$\hat I(f; f_\theta) = H(\hat f; \hat f) - H(\hat f; f_\theta)$$

if $\hat f$ were a non-parametric estimator of f. While $\tilde F$ is a
natural non-parametric estimator of F, there does not exist a natural
non-parametric estimator of f. However, a natural non-parametric estimator
$\hat H(f; f_\theta)$ of $H(f; f_\theta)$ does exist, namely the average
log likelihood $L_n(\theta)$; in symbols,

$$\hat H(f; f_\theta) = \int_{-\infty}^{\infty} \log f_\theta(x) \, d\tilde F(x).$$

A natural estimator $\hat H(f; f)$ will be given below. Akaike (1973) has
pioneered in emphasizing that to find $\hat\theta$, the parameter values
$\theta$ which minimize

$$\hat I(f; f_\theta) = \hat H(f; f) - \hat H(f; f_\theta),$$

it is not necessary to know $\hat H(f; f)$; one need only choose $\theta$
to maximize $L_n(\theta)$. One approach to measuring how well the maximum
likelihood model $f_{\hat\theta}$ "matches" the data would be to measure
how significantly different from zero is $\hat I(f; f_{\hat\theta})$.
Other approaches to measuring the mathematical fit of a model to data are
introduced in this paper using various representing functions of the data
and model, which are called the "raw" and "smooth" representing functions
respectively. One of our goals is to develop means of judging goodness of
fit of a family $f_\theta$ of probability densities to a true probability
density before forming estimators $\hat\theta$ of the parameters.

This paper discusses the increased insight to be obtained by describing
the probability distribution of a random variable X by its quantile
function Q(u), $0 \le u \le 1$, and density-quantile function fQ(u),
$0 \le u \le 1$. Define

$$Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\},$$

$$fQ(u) = f(Q(u)).$$

The quantile-density function q(u), $0 \le u \le 1$, is the derivative of
the quantile function:

$$q(u) = Q'(u).$$

The score function is (-1) times the derivative of the density-quantile
function:

$$J(u) = -(fQ)'(u).$$

An important identity is

$$fQ(u)\, q(u) = 1,$$

which follows by differentiating the identity

$$FQ(u) = u.$$
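
The identity fQ(u) q(u) = 1 can be checked numerically for a distribution whose quantile function is known in closed form. The sketch below uses the standard exponential distribution, for which Q(u) = -log(1-u) and fQ(u) = 1-u, and approximates q(u) by a central difference; the step size and function names are assumptions for the example.

```python
import math

def Q(u):
    """Quantile function of the standard exponential distribution."""
    return -math.log(1.0 - u)

def fQ(u):
    """Density-quantile function: f(Q(u)) = exp(-Q(u)) = 1 - u."""
    return math.exp(-Q(u))

def q_numeric(u, h=1e-5):
    """Quantile-density q(u) = Q'(u), approximated by a central difference."""
    return (Q(u + h) - Q(u - h)) / (2.0 * h)

for u in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(u, fQ(u) * q_numeric(u))   # each product should be close to 1
```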

We can now give an example of the advantages of thinking "quantile" in
the sense of thinking in terms of fQ(u) rather than f(x). Two measures of
the smoothness of a function are the integral of its logarithm and the
integral of its derivative squared. Thus

$$\int_0^1 \log fQ(u) \, du = \int_{-\infty}^{\infty} f(x) \log f(x) \, dx = H(f; f)$$

is the Shannon information measure or entropy of f, while the Fisher
information measure of f is

$$\int_0^1 |J(u)|^2 \, du = \int_0^1 |(fQ)'(u)|^2 \, du
 = \int_0^1 \left| \frac{f'(Q(u))}{fQ(u)} \right|^2 du
 = \int_{-\infty}^{\infty} \frac{|f'(x)|^2}{f(x)} \, dx.$$

One can give a natural estimator of entropy:

$$\hat H(f; f) = -\int_0^1 \log \tilde q(u) \, du,$$

where $\tilde q(u)$ is the sample quantile-density defined below. We call
$\hat H$ the sample entropy.

The  density  quantile  function  as  a  function  of  interest 
for  itself  was  introduced  by  Parzen  (1979).  Tukey  (1965) 
pointed  out  the  significance  of  Q(u)  and  q(u)  under  the  names 
"representing"  function  and  "sparsity"  function.  A  review  of 
some  standard  approaches  to  statistical  modelling  is  given  by 
Ord  and  Patil  (1975)  . 

2. Sample Quantile Function

For a batch of data one can define a sample quantile function
$\tilde Q(u)$, $0 \le u \le 1$, which provides a "universal" description
and summary of the data. However, there is no universally accepted
definition of $\tilde Q(u)$.

Given a sample $X_1, \ldots, X_n$ with order statistics
$X_{(1)} < X_{(2)} < \cdots < X_{(n)}$, one could define $\tilde Q$ by

$$\tilde Q(u) = \tilde F^{-1}(u), \qquad 0 \le u \le 1;$$

then $\tilde Q$ is piecewise constant,

$$\tilde Q(u) = X_{(j)}, \qquad \frac{j-1}{n} < u \le \frac{j}{n}, \quad j = 1, \ldots, n.$$

One often prefers a piecewise-linear definition of $\tilde Q(u)$; then one
defines

$$\tilde Q(u) = X_{(j)} \quad \text{if} \quad u = \frac{2j-1}{2n}
 \quad \left(\text{or, alternatively, } u = \frac{j}{n+1}\right).$$

One also defines values of $\tilde Q(u)$ at the endpoints u = 0 and u = 1.
At other values of u in 0 < u < 1, one defines $\tilde Q(u)$ by linear
interpolation of its values at the grid points (j - 0.5)/n or j/(n+1).
Then $\tilde Q(u)$ is differentiable between grid points, and
$\tilde q(u) = \tilde Q'(u)$ may be expressed in terms of the sample
"spacings"

$$n \{X_{(j+1)} - X_{(j)}\}.$$

When $\tilde Q(u) = X_{(j)}$ at u = (j - 0.5)/n, then
(j = 1, 2, ..., n-1)

$$\tilde q(u) = n \{X_{(j+1)} - X_{(j)}\}, \qquad \frac{2j-1}{2n} < u < \frac{2j+1}{2n}.$$

A favorite tool of statistical data analysis is the histogram, which can
be defined as a piecewise-constant estimator $\hat f(x)$ of the density
function. The sample quantile function $\hat Q(u)$ is then defined as the
inverse of the sample distribution function
$\hat F(x) = \int_{-\infty}^{x} \hat f(y) \, dy$. The insight in a
histogram seems to me to be made more visible by plotting instead the
histogram-quantile function $\hat f(\hat Q(u))$, 0 < u < 1.

A raw estimator of fQ(u), called a raw density-quantile function and
denoted $\tilde f \tilde Q(u)$, can be formed from the reciprocal of a
slightly smoothed estimator of q(u); for example, one might define


$$\tilde f \tilde Q(u) = \frac{2h}{\tilde Q(u+h) - \tilde Q(u-h)}.$$

The sample quantile function $\tilde Q(u)$, $0 \le u \le 1$, is a
stochastic process (or time series) whose asymptotic distribution can be
shown to satisfy (under suitable assumptions on fQ; see Csörgő and Révész
(1978))

$$\{\sqrt{n}\, fQ(u)\, (\tilde Q(u) - Q(u)),\ 0 < u < 1\}
 \xrightarrow{L} \{B(u),\ 0 < u < 1\},$$

where $\{B(u),\ 0 \le u \le 1\}$ denotes a Brownian Bridge stochastic
process with covariance function

$$E[B(u_1) B(u_2)] = u_1 (1 - u_2) \quad \text{for } 0 \le u_1 \le u_2 \le 1,$$

and $\xrightarrow{L}$ denotes convergence in distribution of stochastic
processes, the limit being identically distributed as the process on the
right.

The asymptotic distribution of the sample spacings, and thus of
$\tilde q(u)$, has also been extensively investigated but is difficult to
summarize briefly. One important fact is that for any fixed
$u_1, \ldots, u_k$

$$fQ(u_1)\, \tilde q(u_1), \ \ldots, \ fQ(u_k)\, \tilde q(u_k)$$

are asymptotically independent and distributed as an exponential
distribution with mean 1.

The difference between the roles of distribution functions and quantile
functions in statistical inference is made clear by considering the basic
goodness of fit problem: test the hypothesis $H_0$,

$$H_0 : F(x) = F_0(x), \qquad -\infty < x < \infty,$$

that the true distribution function F(x) equals a specified distribution
function $F_0(x)$. One could compare the sample distribution
$\tilde F(x)$ to $F_0(x)$ (or equivalently test whether the transformed
random variables $F_0(X_1), \ldots, F_0(X_n)$ are uniformly distributed)
by comparing $\tilde F(Q_0(u))$ to u. The applicable asymptotic
distribution theorem is

$$\{\sqrt{n}\, \{\tilde F(Q_0(u)) - u\},\ 0 \le u \le 1\}
 \xrightarrow{L} \{B(u),\ 0 \le u \le 1\}.$$

Alternatively, one could compare quantile functions. Instead of comparing
$\tilde Q(u)$ to $Q_0(u) = F_0^{-1}(u)$, one could compare the sample
quantile function of $F_0(X_1), \ldots, F_0(X_n)$, which equals
$F_0(\tilde Q(u))$, to u. The relevant asymptotic distribution theorem is

$$\{\sqrt{n}\, \{F_0(\tilde Q(u)) - u\},\ 0 \le u \le 1\}
 \xrightarrow{L} \{B(u),\ 0 \le u \le 1\}.$$

The problem of statistical modeling can be elegantly defined in terms of
quantile functions: one seeks to determine distribution functions
$F_0(x)$ such that $F_0(\tilde Q(u))$ is not significantly different from
a uniform quantile function u. Given a parametric family of distribution
functions $F_\theta(x)$, an optimal estimator $\hat\theta$ of $\theta$
could be defined as the value of $\theta$ which minimizes the distance
$\| F_\theta(\tilde Q(u)) - u \|$ for a suitable measure of distance
between functions on the interval 0 to 1.
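
The comparison of the transformed sample quantile function with the uniform quantile function u can be sketched as follows. The statistic computed here, the largest scaled deviation on the grid (j - 0.5)/n, is an illustrative choice and not a distance proposed in the paper; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def uniform_deviation(sample, F0):
    """sqrt(n) times the largest gap between F0(X_(j)) and the grid value (j - 0.5)/n."""
    xs = sorted(sample)
    n = len(xs)
    gaps = (abs(F0(x) - (j - 0.5) / n) for j, x in enumerate(xs, start=1))
    return math.sqrt(n) * max(gaps)

std = NormalDist()                                   # hypothesized F0: standard normal
n = 9
ideal = [std.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
print(uniform_deviation(ideal, std.cdf))             # essentially zero
print(uniform_deviation([x + 1.0 for x in ideal], std.cdf))  # clearly positive
```

A sample lying exactly on the hypothesized quantile function gives a deviation of essentially zero, while a shifted sample does not.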

An example of a distance is the conventional $L_2$ distance

$$\| g_1 - g_2 \|^2 = \int_0^1 |g_1(u) - g_2(u)|^2 \, du.$$

However, one would like to choose the distance so that the estimator
$\hat\theta$ would be asymptotically efficient. Such a distance is
provided by the reproducing kernel Hilbert space (RKHS) norm of the
covariance kernel of the Brownian Bridge stochastic process; it can be
defined over any sub-interval $p \le u \le q$ with $0 < p < q < 1$:

$$\| g_1 - g_2 \|^2_{p,q} = \int_p^q |g_1'(u) - g_2'(u)|^2 \, du
 + \frac{1}{p} |g_1(p) - g_2(p)|^2
 + \frac{1}{1-q} |g_1(q) - g_2(q)|^2,$$

$$\| g_1 - g_2 \|^2_{0,1} = \int_0^1 |g_1'(u) - g_2'(u)|^2 \, du.$$

The inner product is

$$(g_1, g_2)_{p,q} = \int_p^q g_1'(u)\, g_2'(u) \, du
 + \frac{1}{p}\, g_1(p)\, g_2(p)
 + \frac{1}{1-q}\, g_1(q)\, g_2(q).$$

A minimum distance criterion for statistical estimation of the parameters
$\theta$ of a parametric family $F_\theta$ of distribution functions is to
choose $\theta$ to minimize

$$\| F_\theta(\tilde Q(u)) - u \|^2_{0,1}
 = \int_0^1 | f_\theta(\tilde Q(u))\, \tilde q(u) - 1 |^2 \, du.$$

One may show this criterion to be asymptotically equivalent to maximizing
likelihood, or minimizing the estimated directed divergence
$\hat I(f; f_\theta)$:

$$\hat I(f; f_\theta) = \int_{-\infty}^{\infty} \tilde f(x) \log \frac{\tilde f(x)}{f_\theta(x)} \, dx
 = -\int_0^1 \log \{ f_\theta(\tilde Q(u))\, \tilde q(u) \} \, du.$$

3. Location and Scale Parameter Models

An important parametric model for a distribution function F(x) is

$$F(x) = F_0\left(\frac{x - \mu}{\sigma}\right),$$

where $F_0$ is specified, and $\mu$ and $\sigma$ are unknown (location and
scale) parameters to be estimated. Then

$$Q(u) = \mu + \sigma Q_0(u),$$

$$q(u) = \sigma q_0(u),$$

$$fQ(u) = \frac{1}{\sigma} f_0 Q_0(u).$$

Two important choices for $F_0$ are:

(1) the normal or Gaussian case:

$$F_0(x) = \Phi(x) = \int_{-\infty}^{x} \phi(y) \, dy, \qquad
 f_0(x) = \phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} x^2\right);$$

(2) the exponential case:

$$F_0(x) = 1 - e^{-x}, \qquad f_0(x) = e^{-x}.$$
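
For the Gaussian case, the location-scale relations for Q(u) and fQ(u) can be checked against Python's standard-library normal distribution. The function names and the parameter values 10 and 2 are assumptions made for the example.

```python
from statistics import NormalDist

STD = NormalDist()   # F0 = Phi, f0 = phi

def Q(u, mu, sigma):
    """Quantile function under the location-scale model: mu + sigma * Q0(u)."""
    return mu + sigma * STD.inv_cdf(u)

def fQ(u, mu, sigma):
    """Density-quantile function under the model: f0(Q0(u)) / sigma."""
    return STD.pdf(STD.inv_cdf(u)) / sigma

model = NormalDist(10.0, 2.0)
for u in (0.1, 0.5, 0.9):
    # direct evaluation from the N(10, 2^2) distribution, for comparison
    print(Q(u, 10.0, 2.0), model.inv_cdf(u), fQ(u, 10.0, 2.0), model.pdf(model.inv_cdf(u)))
```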




The  quantile  functions,  score  functions,  and  density- 
quantile  functions  of  some  standard  probability  laws  are  given 
in  the  Table.  Graphs  of  density-quantile  functions  are  given 
in  the  Figures. 

Because of the way that fQ(u) depends on $\mu$ and $\sigma$, one can
introduce functions to test the hypothesis
$H_0 : Q(u) = \mu + \sigma Q_0(u)$ which do not require estimation of
$\mu$ and $\sigma$ before testing the hypothesis. Define

$$\sigma_0 = \int_0^1 f_0 Q_0(u)\, q(u) \, du,$$

$$d(u) = \frac{1}{\sigma_0} f_0 Q_0(u)\, q(u),$$

$$D(u) = \int_0^u d(u') \, du'.$$

We call D(u) a transformation distribution function, and d(u) a
transformation density. The null hypothesis $H_0$ is equivalent to

$$D(u) = u, \qquad d(u) = 1, \qquad 0 \le u \le 1.$$

Given an estimator $\tilde D(u)$, $0 \le u \le 1$, the deviations of
$\tilde D(u)$ from linearity can be used to test whether a sample consists
of random variables satisfying $H_0$, or consists of random variables
satisfying $H_0$ plus outliers. Such techniques would be useful for many
diverse applications.


Table: Quantile Functions, Score Functions, and Density-Quantile
Functions. [Table not reproduced.]

Figure A: Density-quantile functions fQ(u), 0 < u < 1, of some common
probability distributions: Lognormal, Logistic, Normal, Cauchy, Weibull
with various shape parameters, and Extreme Value. [Figure not reproduced.]

4. Quantile Based Measures of Average, Deviation, Tail Behavior,
and Modes

We propose that the sample quantile function provides a representing
function for the sample in the following senses:

(1) Models for the data should be viewed as being in one-to-one
correspondence with the smooth quantile functions Q(u),
$0 \le u \le 1$, which are their representing functions.

(2) The criteria for testing whether a model fits the data should be
based on measures of fit between the representing functions
$\tilde Q(u)$ and Q(u).

(3) Since the sample is summarized by its representing function
$\tilde Q(u)$, any descriptor of the sample should be expressible as a
function of $\tilde Q(u)$. Similarly, any descriptor of the
distribution of X should be expressible as a function of Q(u).

There are four characteristics of a probability distribution which we
would like to infer from the data:

(1) location, represented by a measure of average;

(2) spread, represented by a measure of deviation;

(3) tail behavior, represented by the behavior of fQ(u) as u tends to 0
and 1;

(4) modality, represented by the number of modes (relative maxima) in the
probability density or in the density-quantile function.

Location and spread parameters for a distribution seem to be meaningful
only when it is unimodal; otherwise, one may want to find an associated
variable whose values can be used to divide the original sample of X
values into two or more samples, each of which is unimodal.

A parameter representing average or location will be denoted by $\mu$; it
could be defined by one of the following concepts:

median: $\mu = Q(0.5)$,

mid-quartile: $\mu = \tfrac{1}{2}\{Q(0.25) + Q(0.75)\}$,

mid-range: $\mu = \tfrac{1}{2}\{Q(0) + Q(1)\}$,

mean: $\mu = \int_0^1 Q(u) \, du = \int_{-\infty}^{\infty} x f(x) \, dx$.

A parameter representing deviation or spread or scale will be denoted by
$\sigma$; it could be defined by one of the following concepts:

interquartile range: $\sigma = Q(0.75) - Q(0.25)$,

score deviation: $\sigma = \int_0^1 J_0(u)\, Q(u) \, du$, with score
function $J_0(u)$,

standard deviation:
$\sigma = \left\{ \int_0^1 \left( Q(u) - \int_0^1 Q(t) \, dt \right)^2 du \right\}^{1/2}$.
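
Several of these descriptors can be computed directly from any quantile function supplied as a callable. The sketch below is an illustration only: the integrals are approximated by a midpoint rule on a uniform grid (an assumption of the example), and the mid-range is omitted because Q(0) or Q(1) may be infinite, as for the exponential distribution used in the check.

```python
import math

def quantile_summaries(Q, grid=10000):
    """Location and scale descriptors computed from a quantile function Q on (0, 1)."""
    us = [(k + 0.5) / grid for k in range(grid)]          # midpoint rule nodes
    mean = sum(Q(u) for u in us) / grid
    sd = math.sqrt(sum((Q(u) - mean) ** 2 for u in us) / grid)
    return {
        "median": Q(0.5),
        "mid_quartile": 0.5 * (Q(0.25) + Q(0.75)),
        "interquartile_range": Q(0.75) - Q(0.25),
        "mean": mean,
        "standard_deviation": sd,
    }

# Standard exponential: Q(u) = -log(1 - u); mean 1, standard deviation 1, median log 2.
summary = quantile_summaries(lambda u: -math.log(1.0 - u))
print(summary)
```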

The properties of fQ(u) describe the tail behavior, modality, and symmetry
of the distribution. Indices $\alpha_1$ and $\alpha_2$ such that

$$fQ(u) \sim u^{\alpha_1} \quad \text{as } u \to 0,$$

$$fQ(u) \sim (1-u)^{\alpha_2} \quad \text{as } u \to 1$$

may be rigorously defined, when they exist, by

$$\alpha_1 = -\lim_{u \to 0} \frac{u J(u)}{fQ(u)}, \qquad
 \alpha_2 = \lim_{u \to 1} \frac{(1-u) J(u)}{fQ(u)}.$$

We call $\alpha_1$ the left tail exponent, and $\alpha_2$ the right tail
exponent.

The tail exponent $\alpha$ indicates whether the tail is short, medium, or
long: $\alpha < 1$, short; $\alpha = 1$, medium; $\alpha > 1$, long. The
Gaussian distribution has $\alpha_1 = \alpha_2 = \alpha = 1$; the
exponential distribution has $\alpha_1 = 0$ and $\alpha_2 = 1$; the Cauchy
distribution has $\alpha_1 = \alpha_2 = \alpha = 2$. The graphs of their
fQ functions are given in Figure A. Our ideas of the canonical shapes of
distributions seem to me to become unified when they are formulated not
in terms of the shape of f(x) but in terms of the shape of fQ(u); for
example, J- and U-shaped distributions correspond to $\alpha \le 0$.
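
The limit definitions of the tail exponents can be checked numerically for distributions whose fQ and J are known in closed form. The sketch below evaluates the ratios at a small distance `eps` from each endpoint (an assumed approximation to the limits) for the exponential distribution, with fQ(u) = 1 - u and J(u) = 1, and for the logistic distribution, with fQ(u) = u(1 - u) and J(u) = 2u - 1.

```python
def tail_exponents(fQ, J, eps=1e-6):
    """Approximate the left and right tail exponents by evaluating near 0 and 1."""
    u_left, u_right = eps, 1.0 - eps
    alpha_1 = -u_left * J(u_left) / fQ(u_left)
    alpha_2 = (1.0 - u_right) * J(u_right) / fQ(u_right)
    return alpha_1, alpha_2

# Exponential: J(u) = -(fQ)'(u) = 1.
print(tail_exponents(lambda u: 1.0 - u, lambda u: 1.0))
# Logistic: J(u) = -(1 - 2u) = 2u - 1.
print(tail_exponents(lambda u: u * (1.0 - u), lambda u: 2.0 * u - 1.0))
```

The exponential case gives values near (0, 1) and the logistic case values near (1, 1), that is, medium tails on both sides.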

When fQ(u) is unimodal, an important descriptor is the mode percentile,
denoted $p_{\text{mode}}$. It is defined to be the value of u at which
fQ(u) achieves its mode (or maximum value). The value of
$p_{\text{mode}}$, and its relation to 0.5, is a quick summary of the
skewness of the distribution.

When $p_{\text{mode}} \ge 0.5$, the distribution is conventionally
described as being skewed to the left; this occurs if we assume that fQ
satisfies $fQ(u) \le fQ(1-u)$, $0 \le u \le 0.5$, which implies that
$Q(u) + Q(1-u) \le 2Q(0.5)$, and consequently that

mean $\le$ median $\le$ mode.

Similarly $p_{\text{mode}} \le 0.5$ (and the distribution is skewed to the
right) if we assume that $fQ(u) \ge fQ(1-u)$, $0 \le u \le 0.5$, which
implies that $Q(u) + Q(1-u) \ge 2Q(0.5)$, and consequently that

mean $\ge$ median $\ge$ mode.

The fact that a density-quantile function is always defined on the unit
interval, while a density function f(x) is defined on an infinite
interval, seems to me to make the former easier to estimate.

5. Quantile Box-Plots and Transformation Distribution Functions

The most dramatic new data-analytic tools suggested by the quantile and
density-quantile approach are quantile box plots of $\tilde Q(u)$,
$0 \le u \le 1$, and plots of sample transformation distribution
functions $\tilde D(u)$, $0 \le u \le 1$.

Quantile box plots are formed of the original data and of the data after
transformations such as square root, logarithm, and reciprocal. They
provide quick procedures for estimating location, scale, and shape. A
quantile box plot consists of a graph of a quantile function on which are
superimposed various boxes with vertices (p, Q(p)), (p, Q(1-p)),
(1-p, Q(p)), (1-p, Q(1-p)), each of which we call a p-box. One usually
chooses p = 1/4, 1/8, 1/16. Within the quartile box (p = 0.25), one draws
a median line with vertices (0.25, Q(0.5)), (0.75, Q(0.5)). An
approximate confidence interval for the median Q(0.5) is indicated by a
vertical line with vertices $(0.5,\ Q(0.5) \pm IQ/\sqrt{n})$, where n is
the sample size and IQ = Q(0.75) - Q(0.25) is the inter-quartile range.
The symmetry of the distribution is judged by the symmetry of Q(u) within
the quartile box.

A  quantile  box  plot  is  an  extension  of  the  idea  of  a  box 
plot  introduced  by  Tukey  (1977)  . 
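
The geometric elements of a quantile box plot (the p-box vertices, the median line, and the approximate confidence line for the median) can be computed from any quantile function. The sketch below takes the quantile function as a callable; the function names are hypothetical, and plotting itself is left out.

```python
import math

def p_box(Q, p):
    """Vertices (p, Q(p)), (p, Q(1-p)), (1-p, Q(p)), (1-p, Q(1-p)) of the p-box."""
    lo, hi = Q(p), Q(1.0 - p)
    return [(p, lo), (p, hi), (1.0 - p, lo), (1.0 - p, hi)]

def median_line(Q):
    """Horizontal median line drawn inside the quartile box."""
    return [(0.25, Q(0.5)), (0.75, Q(0.5))]

def median_confidence_line(Q, n):
    """Vertical line at u = 0.5 from Q(0.5) - IQ/sqrt(n) to Q(0.5) + IQ/sqrt(n)."""
    iq = Q(0.75) - Q(0.25)
    half = iq / math.sqrt(n)
    return [(0.5, Q(0.5) - half), (0.5, Q(0.5) + half)]

uniform_Q = lambda u: u          # quantile function of the uniform distribution on (0, 1)
for p in (1 / 4, 1 / 8, 1 / 16):
    print(p_box(uniform_Q, p))
print(median_line(uniform_Q), median_confidence_line(uniform_Q, 16))
```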

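A transformation distribution function, or cumulative weighted spacings
function, is defined by

$$\tilde D(u) = \int_0^u \tilde d(t) \, dt, \qquad 0 \le u \le 1,$$

where

$$\tilde d(u) = \frac{1}{\tilde\sigma_0} f_0 Q_0(u)\, \tilde q(u), \qquad
 \tilde\sigma_0 = \int_0^1 f_0 Q_0(u)\, \tilde q(u) \, du.$$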
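
A discretized version of the cumulative weighted spacings function for the Gaussian null hypothesis can be sketched as follows. The discretization is an assumption of the sketch: the spacing between the j-th and (j+1)-th order statistics is weighted by the standard normal density-quantile function evaluated at u = j/n, the midpoint of the interval on which the sample quantile-density is constant. The Buffalo snowfall values are taken from the table in Section 6. Because only normalized spacings enter, the result does not change under a shift or rescaling of the data.

```python
from statistics import NormalDist

def weighted_spacings_D(sample):
    """Grid points and values of a discretized cumulative weighted spacings function."""
    xs = sorted(sample)
    n = len(xs)
    std = NormalDist()
    # weight for the j-th spacing: f0Q0(j/n) * (X_(j+1) - X_(j)), j = 1, ..., n-1
    weights = [std.pdf(std.inv_cdf(j / n)) * (xs[j] - xs[j - 1]) for j in range(1, n)]
    total = sum(weights)
    grid, values, running = [], [], 0.0
    for j, w in enumerate(weights, start=1):
        running += w
        grid.append((j + 0.5) / n)      # right end of the j-th interval
        values.append(running / total)
    return grid, values

buffalo = [25.0, 39.8, 46.7, 49.1, 49.6, 51.6, 53.5, 54.7, 60.3, 63.6,
           64.8, 69.4, 71.8, 72.9, 79.0, 79.6, 80.7, 81.6, 83.6, 103.9]
grid, D = weighted_spacings_D(buffalo)
for u, value in zip(grid, D):
    print(round(u, 3), round(value, 3))
```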

Its pseudo-correlations are defined by

$$\tilde\rho(v) = \int_0^1 e^{2\pi i u v}\, \tilde d(u) \, du, \qquad v = 0, \pm 1, \ldots$$

The asymptotic distributional properties of $\tilde d(u)$ are similar to
those of the sample spectral density of a stationary time series. Tests
of $H_0$ could be based on $\int_0^1 \log \tilde d(u) \, du$; the
deviation from D(u) = u of $\tilde D(u)$; the deviation from
$\rho(v) = 0$ of $\tilde\rho(v)$, v = 1, 2, ....

To estimate the density-quantile function fQ(u), one uses $\tilde d(u)$ to
form smooth estimators $\hat d(u)$ of d(u). Two main approaches are:

(1) kernel method: for a suitable kernel K,

$$\hat d_K(u) = \int_0^1 \tilde d(t)\, K(u - t) \, dt;$$

(2) autoregressive method: for a suitable order m,

$$\hat d_m(u) = \hat\sigma_m^2 \left| 1 + \hat\alpha_m(1)\, e^{2\pi i u}
 + \cdots + \hat\alpha_m(m)\, e^{2\pi i u m} \right|^{-2},$$

where $\hat\sigma_m^2$ and $\hat\alpha_m(j)$, j = 1, ..., m, are
determined from certain linear equations (Yule-Walker equations) in

$$\tilde\rho(v) = \int_0^1 e^{2\pi i u v}\, \tilde d(u) \, du, \qquad v = 0, \pm 1, \ldots, \pm m.$$

The autoregressive estimator, including procedures for selecting the
order m, is implemented in a computer program ONESAM whose use is
illustrated. It should be noted that choosing order m = 0 is equivalent
to accepting $H_0$.
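
The paper does not spell out the Yule-Walker equations, so the following is a sketch under a stated assumption: the coefficients are taken to satisfy sum over j = 0..m of alpha(j) rho(j - v) = 0 for v = 1, ..., m, with alpha(0) = 1 and rho(-v) the complex conjugate of rho(v), and the system is solved by a Levinson-type recursion. The function names are hypothetical; this is not the ONESAM implementation.

```python
import cmath

def yule_walker(rho, m):
    """Solve the assumed Yule-Walker equations for orders 0..m by a Levinson-type recursion.

    rho[v] is the pseudo-correlation at lag v (rho[0] real). Returns the order-m
    coefficients [1, alpha(1), ..., alpha(m)] and the residual variances of orders 0..m.
    """
    alpha = [1 + 0j]
    sigma2 = complex(rho[0]).real
    variances = [sigma2]
    for k in range(m):
        delta = sum(alpha[j] * complex(rho[k + 1 - j]).conjugate() for j in range(k + 1))
        kappa = -delta / sigma2
        alpha = ([1 + 0j]
                 + [alpha[j] + kappa * alpha[k + 1 - j].conjugate() for j in range(1, k + 1)]
                 + [kappa])
        sigma2 *= 1.0 - abs(kappa) ** 2
        variances.append(sigma2)
    return alpha, variances

def ar_density(alpha, sigma2, u):
    """Autoregressive density sigma2 * |sum_j alpha(j) exp(2 pi i u j)|^(-2)."""
    g = sum(a * cmath.exp(2j * cmath.pi * u * k) for k, a in enumerate(alpha))
    return sigma2 / abs(g) ** 2

# Order-1 residual variance equals 1 - |rho(1)|^2, as in the tables of Section 6.
alpha, variances = yule_walker([1.0, 0.3 + 0.2j], 1)
print(alpha, variances)
```

With this convention the order-1 coefficient is minus the conjugate of rho(1), and the order-1 residual variance is 1 - |rho(1)|^2, which agrees with the tabulated values in Section 6 (for example, 1 - 0.1391 = 0.8609 for Cairo).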

A solution to the important problem of estimating fQ(u) is provided by
the "autoregressive" estimator

$$\widehat{fQ}_m(u) = f_0 Q_0(u)\, \{ \tilde\sigma_0\, \hat d_m(u) \}^{-1}.$$

Some diagnostics we use for choosing the order m of the autoregressive
estimator $\widehat{fQ}_m(u)$ are the squared-modulus pseudo-correlations
$|\tilde\rho(v)|^2$; the residual variances $\hat\sigma^2(m)$; the Akaike
order-determining criterion, for sample size T,

$$\mathrm{AIC}(m) = \log \hat\sigma_m^2 + \frac{2m}{T};$$

and Parzen's criterion

$$\mathrm{CAT}(m) = \frac{1}{T} \sum_{j=1}^{m} \hat\sigma_j^{-2} - \hat\sigma_m^{-2},$$

whose shape in practice is similar to the shape of AIC.
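
Both criteria are simple functions of the sequence of residual variances. The sketch below computes them for the Buffalo residual variances listed in Section 6, with T = 20. One assumption is made and should be read as such: the printed CAT formula does not say whether the variances are adjusted for degrees of freedom, and the tabulated CAT values are reproduced when each residual variance of order j is first multiplied by T/(T - j), so the sketch applies that adjustment.

```python
import math

def aic(variances, T):
    """AIC(m) = log(sigma_m^2) + 2m/T for m = 0, 1, ..., given residual variances."""
    return [math.log(s2) + 2.0 * m / T for m, s2 in enumerate(variances)]

def cat(variances, T):
    """CAT(m) using variances adjusted by T/(T - m) (an assumption of this sketch)."""
    values, running = [], 0.0
    for m, s2 in enumerate(variances):
        inv_adjusted = (T - m) / (T * s2)     # reciprocal of T/(T - m) * sigma_m^2
        if m >= 1:
            running += inv_adjusted
        values.append(running / T - inv_adjusted)
    return values

buffalo_variances = [1.0000, 0.9797, 0.9657, 0.9559, 0.9367, 0.9021]
print([round(v, 4) for v in aic(buffalo_variances, 20)])
print([round(v, 4) for v in cat(buffalo_variances, 20)])
```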

Another approach to estimating fQ(u) which deserves investigation is to
estimate log fQ(u) by smoothing $-\log \tilde q(u)$.

6. Examples

To illustrate how to use Q, D, and fQ in statistical data analysis, let
us consider data from Tukey (1977), p. 117, which lists seasonal snowfall
in Buffalo, New York and Cairo, Illinois, from 1918-19 to 1937-38, and
asks "What light do these two batches throw on how they should be
expressed." To answer this question one approach might be to examine the
quantile box-plots of the batches (Figure B); the quartile box in Buffalo
appears symmetric while in Cairo it does not. One might attempt a
transformation of the Cairo data; we choose the square root and conclude
that its quartile box is symmetric. Does this prove that Buffalo snowfall
and square root of Cairo snowfall are Gaussian?

A rigorous approach is to form the cumulative weighted spacings function

$$\tilde D(u) = \frac{\int_0^u \phi\Phi^{-1}(t)\, \tilde q(t) \, dt}
 {\int_0^1 \phi\Phi^{-1}(t)\, \tilde q(t) \, dt}, \qquad 0 \le u \le 1,$$

whose deviations from u provide a test of Gaussian-ness which does not
first require estimation of $\mu$ and $\sigma$ (mean and standard
deviation). The graphs of $\tilde D(u)$ in Figure C indicate clearly that
Buffalo snowfall and square root of Cairo are Gaussian, while Cairo
snowfall is not Gaussian.

The true character of the Cairo snowfall data emerges when one estimates
its fQ function; it turns out to be bimodal, which we interpret to mean
that there are two kinds of snowfall years in Cairo, Illinois: light and
heavy.

Even though various diagnostic tests of the square roots of Cairo
snowfall data indicate that it is Gaussian, the order 1 autoregressive
estimator of the density-quantile function indicates that bimodality is a
possible alternative hypothesis.

Order statistics of seasonal snowfall (inches), 1918-19 to 1937-38

Order        Buffalo      Cairo      Cairo Square Root
  1            25.0         0.4          0.6325
  2            39.8         1.2          1.0954
  3            46.7         1.6          1.2649
  4            49.1         1.8          1.3416
  5            49.6         2.7          1.6432
  6            51.6         2.9          1.7029
  7            53.5         3.0          1.7320
  8            54.7         4.0          2.0000
  9            60.3         4.5          2.1213
 10            63.6         5.4          2.3238
 11            64.8         6.2          2.4900
 12            69.4         6.8          2.6077
 13            71.8         7.2          2.6833
 14            72.9         7.4          2.7203
 15            79.0        11.3          3.3615
 16            79.6        11.5          3.3912
 17            80.7        11.5          3.3912
 18            81.6        12.4          3.5214
 19            83.6        13.9          3.7283
 20           103.9        14.1          3.7550

Mean           64.1         6.5          2.375
Q(0.25)        49.6         2.7          1.6432
Q(0.50)        64.2         5.8          2.4069
Q(0.75)        79.6        11.5          3.3912
IQ             30.0         8.8          1.748
S.D.           18.4         4.5          0.945

Diagnostics for choosing the order of the autoregressive estimator

Squared-modulus pseudo-correlations |rho(v)|^2

  v          Buffalo      Cairo      Cairo Square Root
  0           1.0000      1.0000         1.0000
  1            .0203       .1391          .0705
  2            .0187       .0192          .0142
  3            .0053       .0215          .0222
  4            .0196       .0496          .0289
  5            .0169       .0317          .0181

Residual variances sigma^2(m)

  m          Buffalo      Cairo      Cairo Square Root
  0           1.0000      1.0000         1.0000
  1            .9797       .8609          .9295
  2            .9657       .8199          .8963
  3            .9559       .8179          .8603
  4            .9367       .7883          .8570
  5            .9021       .7610          .8404

AIC(m)

  m          Buffalo      Cairo      Cairo Square Root
  0            .0000       .0000          .0000
  1            .0795      -.0497          .0269
  2            .1651       .0014          .0905
  3            .2549       .0990          .1495
  4            .3346       .1622          .2457
  5            .3969       .2269          .3261
Minimum
at m =           0           1              0

CAT(m)

  m          Buffalo      Cairo      Cairo Square Root
  0          -1.0000     -1.0000        -1.0000
  1           -.9212     -1.0483         -.9709
  2           -.8369      -.9877         -.9028
  3           -.7497      -.8772         -.8373
  4           -.6718      -.8020         -.7361
  5           -.6076      -.7235         -.6504

[Figures not reproduced: density-quantile function estimates, normal
case, order 1; seasonal snowfall in Cairo from 1918-19 to 1937-38
(inches).]

7. Quantile Simulation and Quantile Bootstrap

The quantile function Q(u), $0 \le u \le 1$, of a random variable X
provides a way of simulating a random sample of X. Let U denote a random
variable uniform on 0 to 1; then $X \stackrel{L}{=} Q(U)$. Let
$U_1, \ldots, U_n$ be independent uniform random variables; then

$$X_1, \ldots, X_n \stackrel{L}{=} Q(U_1), \ldots, Q(U_n).$$

To generate a random sample $X_1, \ldots, X_n$, one generates n random
numbers $U_1, \ldots, U_n$ and transforms them to $X_1, \ldots, X_n$.

To obtain by Monte Carlo methods the distribution of a statistic

$$T = g(X_1, \ldots, X_n),$$

one would generate a large number N of random samples
$X_1, \ldots, X_n$; generate a random sample $T_1, \ldots, T_N$ of the
random variable T; and finally form the sample quantile function
$\tilde Q_T(u)$, which provides an estimator of the true quantile function
of T.

When the quantile function Q(u) of X is not known, one can estimate it by
the sample quantile function $\tilde Q(u)$. Now from random numbers
$U_1, \ldots, U_n$, one can generate "bootstrap" simulated values
[compare Efron (1978)]

$$\tilde X_1 = \tilde Q(U_1), \ \ldots, \ \tilde X_n = \tilde Q(U_n), \qquad
 \tilde T = g(\tilde X_1, \ldots, \tilde X_n).$$

One can generate a random sample $\tilde T_1, \ldots, \tilde T_N$ of
$\tilde T$, whose sample quantile function $\tilde Q_{\tilde T}(u)$
provides an estimator of the true quantile function of T.
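
The quantile bootstrap can be sketched in Python as follows. The sketch assumes the piecewise-linear sample quantile function on the grid (j - 0.5)/n, held constant outside the grid; the function names, the number of replications, and the fixed seed are choices made for the example. The Cairo snowfall values come from the table in Section 6.

```python
import math
import random

def sample_quantile(sample):
    """Piecewise-linear sample quantile function with Q(u) = X_(j) at u = (j - 0.5)/n."""
    xs = sorted(sample)
    n = len(xs)

    def Q(u):
        t = u * n + 0.5
        if t <= 1.0:
            return xs[0]             # assumption: constant outside the grid
        if t >= n:
            return xs[-1]
        j = int(math.floor(t))
        return xs[j - 1] + (t - j) * (xs[j] - xs[j - 1])

    return Q

def quantile_bootstrap(sample, statistic, n_rep=1000, seed=0):
    """Sorted bootstrap replications of `statistic`, each simulated through Q(U)."""
    rng = random.Random(seed)
    Q = sample_quantile(sample)
    n = len(sample)
    replications = []
    for _ in range(n_rep):
        simulated = [Q(rng.random()) for _ in range(n)]   # X_i = Q(U_i)
        replications.append(statistic(simulated))
    return sorted(replications)

cairo = [0.4, 1.2, 1.6, 1.8, 2.7, 2.9, 3.0, 4.0, 4.5, 5.4,
         6.2, 6.8, 7.2, 7.4, 11.3, 11.5, 11.5, 12.4, 13.9, 14.1]
means = quantile_bootstrap(cairo, lambda xs: sum(xs) / len(xs), n_rep=500, seed=1)
print(means[len(means) // 40], means[len(means) // 2], means[-len(means) // 40])
```

The sorted replications are themselves the order statistics from which a sample quantile function of the statistic can be formed.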
Bivariate distributions. An outstanding problem of
statistics is the simulation of multivariate distributions.
To illustrate the quantile approach to this problem, let
$(X_1, X_2)$ have joint distribution function $F(x_1, x_2)$. Denote
the marginal distribution functions of $X_1$ and $X_2$ by $F_1(x_1)$
and $F_2(x_2)$. Denote the quantile functions of the marginal
distributions by $Q_1(u_1)$ and $Q_2(u_2)$. Define

$$D(u_1, u_2) = F(Q_1(u_1), Q_2(u_2));$$

it is the joint distribution function of the "rank transforms"

$$U_1 = F_1(X_1), \qquad U_2 = F_2(X_2).$$

One can generate $(X_1, X_2)$ by generating $(U_1, U_2)$ from the
distribution $D(u_1, u_2)$ and then forming

$$X_1 = Q_1(U_1), \qquad X_2 = Q_2(U_2).$$

To generate $(U_1, U_2)$ one chooses $U_1$ to be uniform on
0 to 1, and then generates $U_2$ by the conditional distribution
$D_{U_2|U_1}(u_2 | u_1)$ or its quantile function $Q_{U_2|U_1}(p | u_1)$
by the formula

$$U_2 \mid U_1 = u_1 \ \overset{L}{=} \ Q_{U_2|U_1}(U_2' \mid u_1),$$

where $U_2'$ is uniform on 0 to 1, and independent of $U_1$. The
conditional quantile function $Q_{U_2|U_1}(p | u_1)$ can be estimated
by the sample quantile function of the sample values $U_2$
corresponding to sample values $U_1$ near $u_1$.
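The bivariate scheme can be sketched as follows, under stated assumptions: the rank transforms are taken as rank/n, and "near $u_1$" is interpreted as the `k` observations whose $U_1$ values are closest to the drawn value. The window size `k`, the function names, and the simulated data are hypothetical choices made for illustration; the text does not specify them.

```python
import math
import random

def step_quantile(sorted_values, u):
    """Sample quantile function: the ceil(n*u)-th smallest value."""
    n = len(sorted_values)
    return sorted_values[min(n - 1, max(0, math.ceil(n * u) - 1))]

def rank_transform(values):
    """Return u_i = rank(values[i]) / n, the sample analogue of F(X_i)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u

def make_pair_simulator(x1, x2, k=25):
    """Build a simulator of (X1, X2) pairs from paired observations.

    k is a hypothetical tuning constant: the number of observations
    whose U1 values count as "near" the drawn u1.
    """
    n = len(x1)
    u1, u2 = rank_transform(x1), rank_transform(x2)
    s1, s2 = sorted(x1), sorted(x2)

    def simulate(rng):
        draw_u1 = rng.random()
        # Sample values of U2 whose U1 partners are nearest to draw_u1
        nearest = sorted(range(n), key=lambda i: abs(u1[i] - draw_u1))[:k]
        conditional_u2 = sorted(u2[i] for i in nearest)
        # Conditional quantile function evaluated at an independent uniform
        draw_u2 = step_quantile(conditional_u2, rng.random())
        return step_quantile(s1, draw_u1), step_quantile(s2, draw_u2)

    return simulate

rng = random.Random(3)
x1 = [rng.gauss(0.0, 1.0) for _ in range(500)]
x2 = [a + 0.5 * rng.gauss(0.0, 1.0) for a in x1]  # positively dependent pair
simulate = make_pair_simulator(x1, x2)
pairs = [simulate(rng) for _ in range(1000)]

# Fraction of simulated pairs lying on the same side of both sample medians
m1, m2 = sorted(x1)[250], sorted(x2)[250]
same_side = sum((a > m1) == (b > m2) for a, b in pairs) / len(pairs)
```

A value of `same_side` well above one half indicates that the simulated pairs retain the dependence present in the data.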
8. Quantile Formulation of Robust Location and Scale Estimators.

Assume a location-scale parameter model for the quantile
function of a continuous random variable X: $Q(u) = \mu + \sigma Q_0(u)$.

Assume a symmetric distribution, which is equivalent to $Q_0(u)$
being an odd function in the sense that $Q_0(1-u) = -Q_0(u)$.

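Given a sample $X_1, \ldots, X_n$, the log likelihood function
may be written in terms of the sample quantile function $\tilde{Q}(u)$
[compare Parzen (1979a)]:

$$
\begin{aligned}
L &= \frac{1}{n} \log f(X_1, \ldots, X_n; \mu, \sigma) \\
  &= \frac{1}{n} \sum_{i=1}^{n} \log \frac{1}{\sigma} f_0\!\left(\frac{X_i - \mu}{\sigma}\right) \\
  &= -\log \sigma + \int_{-\infty}^{\infty} \log f_0\!\left(\frac{x - \mu}{\sigma}\right) d\tilde{F}(x) \\
  &= -\log \sigma + \int_0^1 \log f_0\!\left(\frac{\tilde{Q}(u) - \mu}{\sigma}\right) du .
\end{aligned}
\tag{1}
$$

The maximum likelihood estimators $\hat{\mu}$ and $\hat{\sigma}$ satisfy
$\partial L / \partial \mu = 0$ and $\partial L / \partial \sigma = 0$.
An important role in these equations is played by the
Fisher score function

$$\psi(x) = -\frac{f_0'(x)}{f_0(x)} = -\frac{d}{dx} \log f_0(x). \tag{2}$$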

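Between $\psi$ and the score function $J_0(u) = -(f_0Q_0(u))'$, where
$f_0Q_0(u) = f_0(Q_0(u))$ is the density-quantile function, there
is an important relation:

$$J_0(u) = \psi(Q_0(u)). \tag{3}$$

The maximum likelihood estimators $\hat{\mu}$ and $\hat{\sigma}$ are the solutions of

$$\int_0^1 \psi\!\left(\frac{\tilde{Q}(u) - \hat{\mu}}{\hat{\sigma}}\right) du = 0, \tag{4}$$

$$\int_0^1 \psi\!\left(\frac{\tilde{Q}(u) - \hat{\mu}}{\hat{\sigma}}\right) \{\tilde{Q}(u) - \hat{\mu}\}\, du = \hat{\sigma}.$$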

Under  the  symmetry  assumption  one  seeks  robust  estimators 
of  location;  various  standard  estimators  may  be  heuristically 
motivated  by  approximating  (4)  in  suitable  ways. 

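M-estimators are defined by introducing a window

$$w(x) = \frac{1}{x}\, \psi(x).$$

Then (4) may be written

$$\int_0^1 w\!\left(\frac{\tilde{Q}(u) - \hat{\mu}}{\hat{\sigma}}\right) \{\tilde{Q}(u) - \hat{\mu}\}\, du = 0.$$

To estimate $\mu$, consider the limit of an iterative sequence $\hat{\mu}^{(n)}$
defined by

$$\hat{\mu}^{(n+1)} = \frac{\displaystyle\int_0^1 w\!\left(\frac{\tilde{Q}(u) - \hat{\mu}^{(n)}}{\hat{\sigma}}\right) \tilde{Q}(u)\, du}{\displaystyle\int_0^1 w\!\left(\frac{\tilde{Q}(u) - \hat{\mu}^{(n)}}{\hat{\sigma}}\right) du}. \tag{5}$$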


The estimator $\hat{\mu}$ is an M-estimator. For $w(x)$ one could choose
a function which corresponds to Student's t distribution with
m degrees of freedom [see Parzen (1979a)]:

$$w(x) = \frac{m+1}{m} \cdot \frac{1}{1 + \dfrac{x^2}{m}}.$$


Various widely used choices of $w(x)$ are described in Hogg (1979).
The most widely used choice of $w(x)$ may be Tukey's bisquare
window

$$w(x) = \left(1 - \left(\frac{x}{c}\right)^2\right)^2 \ \text{for } |x| \le c, \qquad w(x) = 0 \ \text{otherwise},$$

where c is a suitable constant, often chosen as 6. The choice
of m or c is crucial; it should reflect one's beliefs about
how long the tails of $F_0(x)$ are.
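Iteration (5) with the bisquare window can be sketched as follows. Because the sample quantile function is a step function, each integral over $u$ reduces to an average over the observations, so (5) becomes a repeatedly reweighted mean. The scale is held fixed at the median absolute deviation, which is an assumption made for this sketch (the text does not say how the scale is obtained); the function names, starting value, and data are likewise illustrative.

```python
import statistics

def bisquare(x, c=6.0):
    """Tukey's bisquare window: (1 - (x/c)^2)^2 inside [-c, c], else 0."""
    return (1.0 - (x / c) ** 2) ** 2 if abs(x) <= c else 0.0

def m_estimate_location(data, scale, window=bisquare, tol=1e-10, max_iter=100):
    """Iterate mu <- sum(w_i x_i) / sum(w_i), w_i = w((x_i - mu) / scale)."""
    mu = statistics.median(data)  # starting value (an assumption)
    for _ in range(max_iter):
        weights = [window((x - mu) / scale) for x in data]
        total = sum(weights)
        if total == 0.0:
            break  # every observation was rejected; keep the current value
        new_mu = sum(w * x for w, x in zip(weights, data)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

data = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 55.0]  # one gross outlier
center = statistics.median(data)
mad = statistics.median([abs(x - center) for x in data])
mu_hat = m_estimate_location(data, scale=mad)
mean = statistics.mean(data)
```

Here the outlying value receives weight zero, so `mu_hat` stays near the bulk of the data while `mean` is pulled toward the outlier.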

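L-estimators are linear combinations of order statistics
which can be written in terms of the sample quantile function
$\tilde{Q}(u)$ as follows, for a suitable weight function $W_\mu(u)$:

$$\hat{\mu} = \frac{\displaystyle\int_0^1 \tilde{Q}(u)\, W_\mu(u)\, du}{\displaystyle\int_0^1 W_\mu(u)\, du}. \tag{6}$$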

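If the model $Q = \mu + \sigma Q_0$ is assumed to hold, with a symmetric
$Q_0$, and one chooses

$$W_\mu(u) = \psi'(Q_0(u)), \tag{7}$$

then $\hat{\mu}$ is an asymptotically efficient estimator of $\mu$. A rigorous
derivation of (6) and of (10) can be obtained from equation (3)
of Section 9. A heuristic derivation of (6) from (4) is obtained
by writing, to first order,

$$\psi\!\left(\frac{\tilde{Q}(u) - \mu}{\sigma}\right) = \psi(Q_0(u)) + \psi'(Q_0(u)) \left\{\frac{\tilde{Q}(u) - \mu}{\sigma} - Q_0(u)\right\}. \tag{8}$$

Since $\psi(Q_0(u))$ and $Q_0(u)$ are odd functions, and $\psi'(Q_0(u))$ is
even, the integrals over $0 \le u \le 1$ of $\psi(Q_0(u))$ and of
$\psi'(Q_0(u))\, Q_0(u)$ vanish, and the estimation equations for $\mu$ are

$$0 = \int_0^1 \psi\!\left(\frac{\tilde{Q}(u) - \hat{\mu}}{\sigma}\right) du = \int_0^1 \psi'(Q_0(u)) \left\{\frac{\tilde{Q}(u) - \hat{\mu}}{\sigma}\right\} du . \tag{9}$$

From (9) one obtains the estimator defined by (6) and (7).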
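As a worked instance of (6) and (7): for the normal shape $\psi(x) = x$, so $W_\mu(u) = 1$ and the estimator is the sample mean; for the logistic shape $\psi(x) = 2F_0(x) - 1$, so $\psi'(Q_0(u)) = 2 f_0Q_0(u) = 2u(1-u)$, which downweights the extreme order statistics. The sketch below approximates the integrals in (6) by evaluating the weight at the midpoints $(i - 1/2)/n$, which is an assumption of this illustration, as are the function names and data.

```python
def l_estimate_location(data, weight):
    """Approximate (6): weighted average of order statistics,
    with the weight function evaluated at u_i = (i - 1/2) / n."""
    xs = sorted(data)
    n = len(xs)
    ws = [weight((i + 0.5) / n) for i in range(n)]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def normal_weight(u):
    """W(u) = psi'(Q0(u)) for the normal shape: constant."""
    return 1.0

def logistic_weight(u):
    """W(u) = psi'(Q0(u)) = 2 u (1 - u) for the logistic shape."""
    return 2.0 * u * (1.0 - u)

data = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_like = l_estimate_location(data, normal_weight)        # the sample mean
logistic_like = l_estimate_location(data, logistic_weight)  # less affected by 100
```

On data symmetric about a center both weight functions return that center; they differ only in how strongly they react to extreme values.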

An estimator of $\sigma$ which is asymptotically efficient for the
model $Q = \mu + \sigma Q_0$, when $Q_0$ is symmetric, is

$$\hat{\sigma} = \frac{\displaystyle\int_0^1 \tilde{Q}(u)\, W_\sigma(u)\, du}{\displaystyle\int_0^1 Q_0(u)\, W_\sigma(u)\, du} \tag{10}$$

where

$$W_\sigma(u) = J_0(u) + Q_0(u)\, W_\mu(u). \tag{11}$$

It is often the case that $W_\sigma(u)$ is approximately equal to $J_0(u)$
(times a constant such as 1 or 2). This helps explain why the
following definition works.

Score deviations and minimum residual score deviation
estimators. It is a remarkable fact that one can define a
universal (and robust) measure of deviation of a sample:

$$\tilde{\sigma}_0 = \int_0^1 f_0Q_0(u)\, d\tilde{Q}(u). \tag{12}$$

Assuming that $f_0Q_0(u)\, \tilde{Q}(u) = 0$ for $u = 0, 1$, we can
integrate by parts and write $\tilde{\sigma}_0$ in the form

$$\tilde{\sigma}_0 = \int_0^1 J_0(u)\, \tilde{Q}(u)\, du, \tag{13}$$

which we call a sample score deviation. To calculate it one has
to specify a score function $J_0(u)$. Note that $\tilde{\sigma}_0$ estimates a
population quantity defined by

$$\sigma_0 = \int_0^1 J_0(u)\, Q(u)\, du, \tag{14}$$

which we call a score deviation.
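A sample score deviation of the form (13) can be computed as in the sketch below. It assumes the normal score function, for which $\psi(x) = x$ and hence $J_0(u) = Q_0(u) = \Phi^{-1}(u)$; for a normal population with scale $\sigma$ the corresponding score deviation (14) equals $\sigma$, since the integral of $\Phi^{-1}(u)^2$ over $0 \le u \le 1$ is 1. The midpoint approximation of the integral, the function names, and the simulated data are assumptions of this illustration.

```python
import random
from statistics import NormalDist

def sample_score_deviation(data, score):
    """Approximate (13): average of J0(u_i) * X_(i) with u_i = (i - 1/2) / n."""
    xs = sorted(data)
    n = len(xs)
    return sum(score((i + 0.5) / n) * x for i, x in enumerate(xs)) / n

normal_score = NormalDist().inv_cdf  # J0(u) = Q0(u) for the normal shape

rng = random.Random(2)
data = [rng.gauss(5.0, 3.0) for _ in range(5000)]  # illustrative data, scale 3
sigma_tilde = sample_score_deviation(data, normal_score)
shifted = sample_score_deviation([x + 100.0 for x in data], normal_score)
```

Because the normal scores average to zero, adding a constant to every observation leaves the sample score deviation unchanged, as a measure of deviation should.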

Robust  estimators  of  location  called  R-estimators  can  be 
interpreted  as  minimum  "residual  score  deviation"  estimators. 
More precisely, suppose one estimates the location parameter $\mu$
by an estimator $\hat{\mu}$ whose residuals $\tilde{Q}(u) - \hat{\mu}$ have smallest score
deviation:

$$\int_0^1 J_0(u)\, \{\tilde{Q}(u) - \hat{\mu}\}\, du \ \text{ is minimized;}$$

it  can  be  shown  that  this  is  precisely  the  definition  of 
R-estimators. 

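M-estimators can also be motivated from this point of view;
to avoid specifying $J_0(u) = \psi(Q_0(u))$ one replaces it by
$\psi\!\left(\frac{\tilde{Q}(u) - \mu}{\sigma}\right)$, and the criterion to estimate $\mu$ is to minimize

$$\int_0^1 \psi\!\left(\frac{\tilde{Q}(u) - \mu}{\sigma}\right) \{\tilde{Q}(u) - \mu\}\, du = \frac{1}{\sigma} \int_0^1 w\!\left(\frac{\tilde{Q}(u) - \mu}{\sigma}\right) \{\tilde{Q}(u) - \mu\}^2\, du,$$

whose solution might be sought as the limit of sequences
of the form of (5).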

The fact that R and M estimators of $\mu$ can be formulated
as minimum residual score deviation estimators seems to explain
why these methods can be extended to estimation of regression
coefficients. However, L estimators do not have a natural
generalization to regression. Further, R and M estimators yield
asymptotically equivalent results when their $J_0(u)$ and $\psi$
functions satisfy (3).




9. Data Summary by a Few Values of the Sample Quantile Function.

To form an estimated quantile function $\hat{Q}(u)$, the simplest
approach is to first attempt to fit a parametric family of the
form

$$Q(u) = \mu + \sigma Q_0(u) \tag{1}$$

where $Q_0(u)$ is a specified quantile function; $\mu$ and $\sigma$ are called
location and scale parameters since $F(x) = F_0\!\left(\frac{x - \mu}{\sigma}\right)$.

One seeks to form estimators $\hat{\mu}$ and $\hat{\sigma}$ which are asymptotically
efficient under the hypothesis that the true quantile function
satisfies (1).

Some  of  the  aims  for  which  the  quantile  function  approach 
to  statistical  data  analysis  may  provide  rigorous,  yet  simple, 
methods  are  as  follows: 

1. to provide approximately efficient estimators of $\mu$
and $\sigma$ under the hypothesis $H_0$: $Q(u) = \mu + \sigma Q_0(u)$;

2. to perform quick goodness of fit tests of $H_0$, and/or
to find re-expressions (transformations) of the data
which satisfy $H_0$;

3. to perform rigorous goodness of fit tests to identify
quantile functions $Q_0$ for which the data satisfy
$H_0$, and/or to estimate the density-quantile function.
Estimation of $\mu$ and $\sigma$: Efficient and tractable estimators of
$\mu$ and $\sigma$ which are linear functionals in $\tilde{Q}(u)$ can be found using
the theory of regression analysis of continuous parameter time
series developed by Parzen (1961). The asymptotic distribution
theory of $\tilde{Q}(u)$ permits us to write approximately

$$\sqrt{n}\, fQ(u)\, \{\tilde{Q}(u) - Q(u)\} = B(u),$$

where $B(u)$, $0 \le u \le 1$, is a Brownian Bridge. Under $H_0$,

$$Q(u) = \mu + \sigma Q_0(u), \qquad fQ(u) = \frac{1}{\sigma}\, f_0Q_0(u).$$

Consequently, defining $\sigma_B = \sigma / \sqrt{n}$,

$$f_0Q_0(u)\, \tilde{Q}(u) = \mu\, f_0Q_0(u) + \sigma\, f_0Q_0(u)\, Q_0(u) + \sigma_B\, B(u). \tag{2}$$

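The parameter $\sigma_B$ is linearly related to $\sigma$, but it is here treated
as a free parameter. In terms of the inner product $\langle \cdot, \cdot \rangle$ of the RKHS
of the Brownian Bridge covariance kernel one may express the
minimum variance unbiased estimators $\hat{\mu}$ and $\hat{\sigma}$ given $\tilde{Q}(u)$ for
$0 \le u \le 1$, or $p \le u \le q$, or $u = u_1, \ldots, u_k$ as follows:

$$\begin{pmatrix} \hat{\mu} \\ \hat{\sigma} \end{pmatrix} = \mathrm{Inf}^{-1} \begin{pmatrix} \langle f_0Q_0,\ f_0Q_0\, \tilde{Q} \rangle \\ \langle Q_0\, f_0Q_0,\ f_0Q_0\, \tilde{Q} \rangle \end{pmatrix} \tag{3}$$

where Inf is the Information Matrix,

$$\mathrm{Inf} = \begin{pmatrix} \langle f_0Q_0,\ f_0Q_0 \rangle & \langle f_0Q_0,\ Q_0\, f_0Q_0 \rangle \\ \langle Q_0\, f_0Q_0,\ f_0Q_0 \rangle & \langle Q_0\, f_0Q_0,\ Q_0\, f_0Q_0 \rangle \end{pmatrix}. \tag{4}$$

The variance-covariance matrix of $\hat{\mu}$, $\hat{\sigma}$ equals $\sigma_B^2\, \{\mathrm{Inf}\}^{-1}$.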

It should be emphasized that the foregoing expressions are
not valid if $f_0Q_0(u)$ and $(f_0Q_0(u))\, Q_0(u)$ do not belong to the
RKHS, which can happen in the case of the index set $0 \le u \le 1$.
Failure to belong to the RKHS seems to be equivalent to the
optimal parameter estimator involving a few extreme value order
statistics, which implies that the estimators are not asymptotically
normal.
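For a finite index set $u = u_1, \ldots, u_k$ the RKHS inner product reduces to $\langle f, g \rangle = f^{T} K^{-1} g$, where $K_{ij} = \min(u_i, u_j) - u_i u_j$ is the Brownian Bridge covariance matrix, so that (3) becomes a generalized least squares fit of regression (2). The sketch below illustrates this for the normal shape. The seven levels are hypothetical choices made for illustration (they are not taken from Eubank's tables), and the function names and the small linear solver are likewise assumptions of the sketch.

```python
from statistics import NormalDist

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_location_scale(levels, quantiles, q0, fq0):
    """Estimate (mu, sigma) from quantile values at the given levels via (3)."""
    k = [[min(a, b) - a * b for b in levels] for a in levels]
    phi1 = [fq0(u) for u in levels]                      # f0Q0(u)
    phi2 = [fq0(u) * q0(u) for u in levels]              # f0Q0(u) Q0(u)
    y = [fq0(u) * q for u, q in zip(levels, quantiles)]  # f0Q0(u) Q~(u)

    def inner(f, g):
        # <f, g> = f^T K^{-1} g for a finite index set
        return sum(fi * zi for fi, zi in zip(f, solve(k, g)))

    info = [[inner(phi1, phi1), inner(phi1, phi2)],
            [inner(phi2, phi1), inner(phi2, phi2)]]
    return solve(info, [inner(phi1, y), inner(phi2, y)])

std = NormalDist()
q0 = std.inv_cdf

def fq0(u):
    """Density-quantile function f0(Q0(u)) of the standard normal."""
    return std.pdf(std.inv_cdf(u))

levels = [0.0625, 0.125, 0.3125, 0.5, 0.6875, 0.875, 0.9375]  # hypothetical
exact = [2.0 + 3.0 * q0(u) for u in levels]  # quantiles of a normal(2, 3) law
mu_hat, sigma_hat = fit_location_scale(levels, exact, q0, fq0)
```

When the supplied quantile values satisfy the model exactly, as in this usage, the fit returns the location and scale that generated them.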

If  one  could  accomplish  these  aims  using  a  "few"  (say,  7) 
selected  order  statistics,  then  one  could  regard  these  "few" 
order  statistics  as  an  efficient  summary  of  the  entire  sample  of 
size  n.  If  large  samples  (as  well  as  small  samples)  could  be 
effectively  represented  by  a  small  number  of  order  statistics, 
then  every  data  set  could  be  published  and  each  reader  could 
easily  do  "hands  on"  statistical  data  analysis. 

The  problem  of  choosing  order  statistics  for  the  estimation 
of  location  and  scale  parameters  has  an  extensive  literature. 

The density-quantile approach has been investigated by Eubank
(1979) in his Ph.D. thesis. By using location and scale
estimators based on only 7 quantile values for a specified $Q_0$, one
can identify 19 quantile values which are the union of these 7
values over a large number of familiar choices of $Q_0$. The proposed
19 number universal data summary consists of the median $\tilde{Q}(0.5)$;
the j/16 percentiles $\tilde{Q}(j/16)$, $\tilde{Q}((8+j)/16)$ for j = 7, 6, 5, 4, 3, 2, 1;
and the .01 and .02 percentiles $\tilde{Q}(0.01)$, $\tilde{Q}(0.02)$, $\tilde{Q}(0.98)$,
$\tilde{Q}(0.99)$. We reproduce Tables 31 and 32 of Eubank's thesis, which
show which of these order statistics are used to estimate
location and scale parameters of familiar probability laws.
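The 19 levels of the proposed summary can be enumerated and evaluated as in the sketch below. It assumes the step-function form of the sample quantile function; the function and variable names are illustrative.

```python
import math

# Median and j/16 levels (j = 1..15) plus the four extreme levels: 19 in all
SUMMARY_LEVELS = sorted([0.01, 0.02, 0.98, 0.99] + [j / 16 for j in range(1, 16)])

def universal_summary(data):
    """Return {u: sample quantile at u} for the 19 summary levels."""
    xs = sorted(data)
    n = len(xs)

    def q(u):
        return xs[min(n - 1, max(0, math.ceil(n * u) - 1))]

    return {u: q(u) for u in SUMMARY_LEVELS}

summary = universal_summary(range(1, 1001))  # illustrative data: 1, 2, ..., 1000
```

The resulting dictionary is small enough to publish alongside any data set, whatever the sample size.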
Table 31. Order Statistic Selection for Location
Parameter Estimation by Seven Order
Statistics


                         Distribution

Spacing    Normal    Cauchy    Logistic    Extreme Value


.01
.02
.0625
.125
.1875
.25
.3125
.375
.4375
.5
.5625
.625
.6875
.75
.8125
.875
.9375
.98
.99


/ 

/  /  / 

/  / 

/ 

/  / 

/  /  / 

'  ./ 

/ 

/  / 

/  /  / 

/ 


/ 

/ 

/ 

/ 

/ 


/ 


/ 